Abstract. The problem of estimating small transition probabilities for overdamped
Langevin dynamics is considered. A simplification of Girsanov's formula is obtained in 
which the relationship between the infinitesimal generator of the underlying diffusion 
and the change of probability measure corresponding to a change in the potential energy 
is made explicit. From this formula an asymptotic expression for transition probability 
densities is derived. Separately the problem of estimating the probability that a small 
noise Langevin process escapes a potential well is discussed.



1. Introduction 

Let $X_t$ be a stochastic process in $\mathbb{R}^d$ satisfying the stochastic differential equation

$$dX_t = -\nabla V(X_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t \tag{1}$$

This is the overdamped Langevin equation. Formally, $X_t$ is a time homogeneous Ito process [11]
with conservative drift and constant diffusion. Intuitively, $X_t$ represents the dynamics
of large particles interacting through the potential energy $V$, with additional "random"
motion driven by collisions with many small particles. The overdamped Langevin
equation can be viewed as a simplification of the well-known (second order) Langevin
equation, which models the dynamics of a system of particles in contact with a heat
bath at positive temperature $T = (k_B\beta)^{-1}$. The overdamped version is obtained from
a scaling limit of the Langevin equation in which a damping constant tends to infinity
[2], [3]. The overdamped Langevin equation can then be viewed as approximating the
high friction limit of the Langevin equation, in which no acceleration takes place. In this
paper small transition probabilities of the process (1) are considered.
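
As a concrete illustration of the dynamics (1), the following is a minimal Euler-Maruyama sketch in one dimension. The function name, the step size, and the quadratic test potential $V(x) = x^2/2$ are assumptions made only for this illustration (the quadratic potential is unbounded, so it lies outside the boundedness assumptions used later); they are not taken from the paper.

```python
import numpy as np

def simulate_overdamped_langevin(grad_V, x0, beta, T, h, n_paths, rng):
    """Euler-Maruyama sketch for dX = -grad V(X) dt + sqrt(2/beta) dW in 1D.

    grad_V : vectorized derivative of the potential (illustrative input)
    Returns the simulated values of X_T for n_paths independent paths.
    """
    n_steps = int(round(T / h))
    sigma = np.sqrt(2.0 / beta)            # noise amplitude sqrt(2 beta^{-1})
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(h), size=n_paths)  # Brownian increments
        x = x - grad_V(x) * h + sigma * dW
    return x

# Illustrative use: V(x) = x^2 / 2, for which the stationary variance is 1/beta.
rng = np.random.default_rng(0)
x_T = simulate_overdamped_langevin(lambda x: x, 0.0, beta=2.0, T=5.0,
                                   h=0.01, n_paths=20000, rng=rng)
print(np.var(x_T))
```

For this test potential the long-time variance should be close to $\beta^{-1}$, which gives a quick sanity check on the discretization.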

A useful estimate of a small probability should have an error which is much smaller 
than the probability itself. Unfortunately, standard Monte Carlo sampling techniques 
are often not useful in this sense. This is because for a fixed number of samples, as 
the probability p being estimated approaches zero, the variance of the standard Monte 
Carlo estimate of p is nearly proportional to p. The error, represented by the standard 
deviation, is then nearly proportional to $\sqrt{p} \gg p$.

Small probabilities of the process (1) have been studied in the large $\beta$ limit in the
context of Freidlin-Wentzell theory [3]. In particular, the asymptotic behavior of
probabilities as $\beta \to \infty$ satisfies a large deviations principle (LDP) [5]. Though the LDP
by itself says nothing about probabilities at a fixed $\beta$, the Freidlin-Wentzell theory has



Date: April 2012. 

Key words and phrases. Stochastic differential equation, stochastic process, Ito process, Langevin 
equation, overdamped Langevin equation, Brownian dynamics, transition probabilities, importance sam- 
pling, Monte Carlo, small noise diffusion. 






recently been used in conjunction with optimal control theory to construct Monte Carlo
importance sampling schemes that are asymptotically optimal (as $\beta \to \infty$) in various
senses [6], [7], [8]. Such schemes reduce the variance of standard Monte Carlo estimates
by sampling with a measure under which the relevant event is more probable; samples are
then multiplied by an appropriate factor depending on this measure. In general,
asymptotically optimal schemes of this sort are adaptive, with an evolving change of measure
requiring significant computation at each time step. By contrast, non-adaptive schemes,
for which the change of measure is fixed and impact on computation time is negligible,
generally are not asymptotically optimal (see, however, [5]).

Introduced below is a non-adaptive importance sampling scheme for estimating the
probability that a Langevin process escapes a potential well in the large $\beta$ regime. Though
the analysis here is restricted to the overdamped case (1), the scheme can equally be used
with the second order Langevin equation. It is shown to be asymptotically optimal in
certain cases, and to exhibit very good (if not optimal) performance more generally.
Estimates on its effectiveness at finite $\beta$ and asymptotically as $\beta \to \infty$ are given. Separately,
an asymptotic expansion for transition probability densities as $t \to 0$ is proved.

The organization of the paper is as follows. Background and notation are discussed
and a change in measure formula is proved in Section 2 below. In Section 3 an asymptotic
expression for transition probabilities is proved. In Section 4 importance sampling and
the problem of estimating the probability that the process (1) has exited a potential well
are discussed. In Section 5 a one-dimensional numerical example is provided.

2. Background, notation and change of measure 

Here the well-known relationship between stochastic differential equations (SDEs) and 
partial differential equations (PDEs) is briefly reviewed. The discussion here is focused 
on the Langevin SDE 

$$dX_t = -\nabla V(X_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t \tag{2}$$

Here $W_t$ is a $d$-dimensional Wiener process, and $V : \mathbb{R}^d \to \mathbb{R}$ is called the potential.
Throughout it is assumed that $V \in C_b^2(\mathbb{R}^d)$; that is, $V$ is bounded together with its
(continuous) first and second order partial derivatives. Under these conditions (2) has unique
strong solutions for every initial condition as well as transition probability densities [10].
The Langevin SDE has infinitesimal generator $L_V$ defined by

$$L_V f(x) = \lim_{t \to 0^+} \frac{E^x[f(X_t)] - f(x)}{t}$$

for $f \in C_b^2(\mathbb{R}^d)$. Here $E^x$ denotes expectation with respect to the initial condition $X_0 = x$.
From Ito's lemma and the dominated convergence theorem, one finds that

$$L_V = -\nabla V \cdot \nabla + \beta^{-1}\Delta \tag{3}$$

The operator $L_V$ is closely related to probabilities of the process (2). In particular, let
$p_t(x,y)$ be the probability density that $X_t = y$ given that $X_0 = x$. (By the Markov
property of the process (2) this determines all the transition probability densities.) If
the second order partial derivatives of $V$ are all Lipschitz continuous, then for fixed $x$,
$p_t(x,y)$ satisfies the PDE

$$\frac{\partial}{\partial t} p_t(x,y) = L_V^* p_t(x,y) \tag{4}$$

This is the Fokker-Planck equation [11]. Here the operator

$$L_V^* = \nabla \cdot (\nabla V\,\cdot\,) + \beta^{-1}\Delta$$

is formally adjoint to $L_V$ and in (4) is assumed to act only on the $y$-component of
$p_t(x,y)$. In principle by numerically solving the Fokker-Planck equation one obtains the
transition probability densities, but this is impractical when the dimension $d$ is large.
Let $P$ be the reference probability measure under which $X_t$ satisfies

$$dX_t = -\nabla V(X_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t$$

One might ask how the measure $P$ changes if $V$ is replaced by another potential $\tilde V$. In
general this question is answered by Girsanov's theorem [11], [12]. However, the special
structure of the overdamped Langevin equation allows for a useful simplification to the
well-known Girsanov formula. In fact in Theorem 2.1 below it is shown that the change
in probability measure has a simple relationship with the infinitesimal generators $L_V$ and
$L_{\tilde V}$.


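Theorem 2.1. Assume $V, \tilde V \in C_b^2(\mathbb{R}^d)$. Let $\tilde P$ be the reference measure under which $X_t$
satisfies

$$dX_t = -\nabla \tilde V(X_t)\,dt + \sigma\,d\tilde W_t, \quad X_0 = x_0, \quad 0 \le t \le T$$

where $\sigma = \sqrt{2\beta^{-1}}$ and $\tilde W_t$ is a $d$-dimensional Wiener process. Define $P$ by

$$\frac{dP}{d\tilde P} = \exp\left[\sigma^{-2}\left(V(x_0) - V(X_T) - \left[\tilde V(x_0) - \tilde V(X_T)\right] + \frac{1}{2}\int_0^T \left[(L_V + L_0)V(X_s) - (L_{\tilde V} + L_0)\tilde V(X_s)\right] ds\right)\right] \tag{5}$$

where $L_{\cdot}$ is given by (3). Then under $P$, $X_t$ satisfies

$$dX_t = -\nabla V(X_t)\,dt + \sigma\,dW_t, \quad X_0 = x_0, \quad 0 \le t \le T$$

where $W_t$ is a $d$-dimensional $P$-Wiener process.

Remark 2.2. In the above, $L_0 = \beta^{-1}\Delta$.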
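
In practice the density in Theorem 2.1 has to be evaluated along a discretized path. The following one-dimensional sketch approximates its logarithm, $\sigma^{-2}\left(V(x_0) - V(X_T) - [\tilde V(x_0) - \tilde V(X_T)] + \tfrac12\int_0^T [(L_V + L_0)V - (L_{\tilde V} + L_0)\tilde V](X_s)\,ds\right)$, by a left Riemann sum, using $(L_V + L_0)V = -|V'|^2 + \sigma^2 V''$ in one dimension. The function name and argument layout are hypothetical choices for this illustration.

```python
import numpy as np

def log_weight(path, h, sigma, V, dV, d2V, Vt, dVt, d2Vt):
    """Left Riemann-sum approximation of log(dP/dP~) in 1D (illustrative).

    path         : samples X_0, X_h, ..., X_T of a path drawn under the
                   reference potential Vt
    V, dV, d2V   : target potential and its first two derivatives
    Vt, dVt, d2Vt: reference (sampling) potential and its derivatives
    """
    x0, xT = path[0], path[-1]
    xs = path[:-1]                                    # left endpoints
    # (L_V + L_0)V = -|V'|^2 + sigma^2 V'' when L_V = -V' d/dx + (sigma^2/2) d^2/dx^2
    g = -dV(xs) ** 2 + sigma**2 * d2V(xs)
    gt = -dVt(xs) ** 2 + sigma**2 * d2Vt(xs)
    integral = 0.5 * h * np.sum(g - gt)
    boundary = V(x0) - V(xT) - (Vt(x0) - Vt(xT))
    return (boundary + integral) / sigma**2

# Illustrative check: a path frozen at 0, V(x) = x^2/2, Vt = 0, sigma = 1, T = 1.
path = np.zeros(101)
lw = log_weight(path, 0.01, 1.0,
                lambda x: 0.5 * x**2, lambda x: x, lambda x: np.ones_like(x),
                lambda x: 0.0 * x, lambda x: 0.0 * x, lambda x: 0.0 * x)
print(lw)
```

When the two potentials coincide the log-weight vanishes identically, which is a convenient consistency check for any implementation of this kind.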



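Proof. Let $U = \tilde V - V$, and define $P$ by

$$\frac{dP}{d\tilde P} = \exp\left(\sigma^{-1}\int_0^T \nabla U(X_s)\cdot d\tilde W_s - \frac{\sigma^{-2}}{2}\int_0^T \nabla U(X_s)\cdot\nabla U(X_s)\,ds\right) \tag{6}$$

By Girsanov's theorem, under $P$ the process $X_t$ satisfies

$$dX_t = -\nabla V(X_t)\,dt + \sigma\,dW_t, \quad X_0 = x_0, \quad t \in [0,T]$$

By Ito's lemma, under $\tilde P$,

$$dU(X_s) = \nabla U(X_s)\cdot dX_s + \frac{1}{2}\,dX_s'\,\nabla^2 U(X_s)\,dX_s \tag{7}$$

$$= -\nabla U(X_s)\cdot\nabla \tilde V(X_s)\,ds + \sigma\nabla U(X_s)\cdot d\tilde W_s + \frac{\sigma^2}{2}\Delta U(X_s)\,ds \tag{8}$$

where $\nabla^2 U$ denotes the Hessian matrix of $U$, and $dX_s'$ is the transpose of $dX_s$.
Rearranging, multiplying by $\sigma^{-2}$, and using the integral form of (7)-(8), this becomes

$$\sigma^{-1}\int_0^T \nabla U(X_s)\cdot d\tilde W_s = \sigma^{-2}\left[U(X_T) - U(x_0)\right] + \sigma^{-2}\int_0^T \nabla U(X_s)\cdot\nabla\tilde V(X_s)\,ds \tag{9}$$

$$\qquad - \frac{1}{2}\int_0^T \Delta U(X_s)\,ds \tag{10}$$

Substituting (9)-(10) into (6) and simplifying,

$$\frac{dP}{d\tilde P} = \exp\left(\sigma^{-1}\int_0^T \nabla U(X_s)\cdot d\tilde W_s - \frac{\sigma^{-2}}{2}\int_0^T \nabla U(X_s)\cdot\nabla U(X_s)\,ds\right) \tag{11}$$

$$= \exp\left(\sigma^{-2}\left(U(X_T) - U(x_0) + \frac{1}{2}\int_0^T \left(\nabla\tilde V\cdot\nabla\tilde V - \nabla V\cdot\nabla V - \sigma^2\Delta U\right)(X_s)\,ds\right)\right) \tag{12}$$

By comparing (11)-(12) with (5), the result follows.
□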
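
The simplification step in the proof rests on two elementary identities, written out here as a sketch for the reader's convenience. They use only $U = \tilde V - V$, the generator (3), and $\sigma^2 = 2\beta^{-1}$; the same identities hold with $\tilde V$ in place of $V$.

```latex
% Cross terms cancel when the drift change is a gradient:
\nabla U\cdot\nabla\tilde V - \tfrac12\,|\nabla U|^2
  = (\nabla\tilde V - \nabla V)\cdot\nabla\tilde V
    - \tfrac12\,|\nabla\tilde V - \nabla V|^2
  = \tfrac12\left(|\nabla\tilde V|^2 - |\nabla V|^2\right),
\\[4pt]
% The generator applied to its own potential:
(L_V + L_0)V = -|\nabla V|^2 + 2\beta^{-1}\Delta V
             = -|\nabla V|^2 + \sigma^2\,\Delta V .
```

Combining the two shows that the integrand $\nabla\tilde V\cdot\nabla\tilde V - \nabla V\cdot\nabla V - \sigma^2\Delta U$ equals $(L_V + L_0)V - (L_{\tilde V} + L_0)\tilde V$, which is how the stochastic integral disappears from the final formula.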



Although Girsanov's formula and Ito's lemma can be used with any Ito process [11],
in the proof of Theorem 2.1 the assumptions that the change in drift (here $\nabla\tilde V - \nabla V$) is
conservative and that the diffusion matrix (here $\sigma\,\mathrm{Id}$) is a constant multiple of the identity
matrix are essential. The result can be generalized slightly:

Theorem 2.3. Assume $V \in C^2(\mathbb{R}^d)$ and $F : \mathbb{R}^d \to \mathbb{R}^d$ is Lipschitz continuous. Let $P^{ref}$
be the reference measure under which $X_t$ satisfies

$$dX_t = F(X_t)\,dt + \sigma\,dW_t^{ref}, \quad X_0 = x_0, \quad 0 \le t \le T$$

where $W_t^{ref}$ is a $d$-dimensional Wiener process and $\sigma > 0$. Define $P$ by

$$\frac{dP}{dP^{ref}} = \exp\left[\sigma^{-2}\left(V(x_0) - V(X_T) + \frac{1}{2}\int_0^T \left(L_V - L_0 + 2L^{ref}\right)V(X_s)\,ds\right)\right]$$

where

$$L^{ref} = F\cdot\nabla + \frac{\sigma^2}{2}\Delta$$

is the infinitesimal generator of the reference process. Then under $P$, $X_t$ satisfies

$$dX_t = -\nabla V(X_t)\,dt + F(X_t)\,dt + \sigma\,dW_t, \quad X_0 = x_0, \quad 0 \le t \le T$$

where $W_t$ is a $d$-dimensional $P$-Wiener process.

The proof of Theorem 2.3 is similar to that of Theorem 2.1 and is therefore omitted.

Note that much intuition can be gained from a simple inspection of the formula (5).
For example if $T$, $\delta$ are small and

$$A = \{X_\cdot : X_T \in B_\delta(y)\}$$

where $B_\delta(y)$ is a ball of radius $\delta$ around $y$, then

$$P(A) \approx \exp\left[\sigma^{-2}\left(V(x_0) - V(y) - \left[\tilde V(x_0) - \tilde V(y)\right]\right)\right]\tilde P(A) \tag{13}$$

In particular, if $\tilde V = 0$ then the probability on the right hand side of (13) can be written
as an integral of a Gaussian density; this suggests an estimate of asymptotic transition
probabilities which is pursued in the next section.



3. Asymptotic transition probabilities 
Consider the transition probability density $p_t(x,y)$ of the process

$$dX_t = -\nabla V(X_t)\,dt + \sigma\,dW_t \tag{14}$$

Recall $p_t(x,y)$ is the conditional probability density that $X_t = y$ given that $X_0 = x$.
Notice that if $V = 0$ in (14) then $X_t = x + \sigma W_t$ and

$$p_t(x,y) = (2\pi\sigma^2 t)^{-d/2}\exp\left(-\frac{|y-x|^2}{2\sigma^2 t}\right)$$

In the following theorem, Theorem 2.1 is used to estimate transition probability densities
for a generic potential.

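Theorem 3.1. Assume $V \in C^2(\mathbb{R}^d)$ and $(L_V + L_0)V$ is Lipschitz continuous with Lipschitz
constant $K$. Let $p_t(x,y)$ be the transition probability density of the process (14). Define

$$\varphi_{x,y}(r) = (1-r)x + ry$$

Then for any $\delta > 0$,

$$p_t(x,y) \ge \left(\exp\left[\sigma^{-2}\left(V(x) - V(y) + \frac{t}{2}\int_0^1 (L_V + L_0)V(\varphi_{x,y}(r))\,dr - M_1\delta t\right)\right] - M_2\,\gamma(\delta,t)\right)p_t(y-x)$$

and

$$p_t(x,y) \le \left(\exp\left[\sigma^{-2}\left(V(x) - V(y) + \frac{t}{2}\int_0^1 (L_V + L_0)V(\varphi_{x,y}(r))\,dr + M_1\delta t\right)\right] + M_2\,\gamma(\delta,t)\right)p_t(y-x)$$

where

$$M_1 = \frac{1}{2}\sqrt{d}\,K$$

$$M_2 = 2d\exp\left(\sigma^{-2}\left[V(x) - V(y) + \frac{t}{2}\sup\left|(L_V + L_0)V\right|\right]\right)$$

$$\gamma(\delta,t) = \exp\left(-\frac{2\delta^2}{\sigma^2 t}\right)$$

$$p_t(x) = (2\pi\sigma^2 t)^{-d/2}\exp\left(-\frac{|x|^2}{2\sigma^2 t}\right) \tag{15}$$

In particular, as $t \to 0^+$,

$$p_t(x,y) = \left(\exp\left[\sigma^{-2}\left(V(x) - V(y) + \frac{t}{2}\int_0^1 (L_V + L_0)V(\varphi_{x,y}(r))\,dr\right)\right] + O(t^{\alpha+1})\right)p_t(y-x) \tag{16}$$

for any $\alpha \in (0, \tfrac12)$.

Remark 3.2. Note that $p_t(y-x)$ is the transition probability density of the process $\sigma W_t$.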

Theorem 3.1 can be seen as a first-order correction to transition probability densities
when a conservative drift is added to the process

$$dX_t = \sigma\,dW_t \tag{17}$$

as in (14). The term in parentheses in (16) gives the correction corresponding to the
addition of the drift $-\nabla V$. Note that when $t$ is small, the correction is dominated by
the term $\exp[\sigma^{-2}(V(x) - V(y))]$, which depends only on the change in potential energy
from $x$ to $y$.
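
The leading-order correction can be checked numerically in a case where the transition density is known in closed form. The sketch below evaluates the approximation $\exp\left[\sigma^{-2}\left(V(x) - V(y) + \tfrac{t}{2}\int_0^1 (L_V + L_0)V((1-r)x + ry)\,dr\right)\right]p_t(y-x)$ in one dimension and compares it with the exact Ornstein-Uhlenbeck density for $V(x) = x^2/2$. This potential is an assumption chosen for the illustration only (it is unbounded, so it falls outside the hypotheses of the theorem), and all function names are hypothetical.

```python
import numpy as np

def small_time_density(x, y, t, sigma, V, G, n_quad=200):
    """Leading-order small-time approximation of p_t(x, y) in 1D (illustrative).

    G(z) should return (L_V + L_0)V(z) = -V'(z)^2 + sigma^2 V''(z).
    The integral over the straight line from x to y uses the midpoint rule.
    """
    r = (np.arange(n_quad) + 0.5) / n_quad
    line_integral = np.mean(G((1.0 - r) * x + r * y))
    gauss = np.exp(-(y - x) ** 2 / (2 * sigma**2 * t)) / np.sqrt(2 * np.pi * sigma**2 * t)
    return np.exp((V(x) - V(y) + 0.5 * t * line_integral) / sigma**2) * gauss

def ou_exact_density(x, y, t, sigma):
    """Exact transition density of dX = -X dt + sigma dW."""
    mean = x * np.exp(-t)
    var = sigma**2 * (1.0 - np.exp(-2.0 * t)) / 2.0
    return np.exp(-(y - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

sigma = 1.0
V = lambda z: 0.5 * z**2
G = lambda z: -z**2 + sigma**2          # -V'^2 + sigma^2 V'' for V = z^2/2
for t in (0.1, 0.01):
    approx = small_time_density(0.3, 0.5, t, sigma, V, G)
    exact = ou_exact_density(0.3, 0.5, t, sigma)
    print(t, approx / exact)
```

The printed ratio should approach 1 as $t$ decreases, which is the behavior the asymptotic expansion describes.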

Using Theorem 2.3, the asymptotic result of Theorem 3.1 can be generalized as follows:

Theorem 3.3. Let $L^{ref}$ and $p_t^{ref}(x,y)$ be the infinitesimal generator and transition
probability density of the reference process

$$dX_t = F(X_t)\,dt + \sigma\,dW_t \tag{18}$$

where $W_t$ is a $d$-dimensional Wiener process and $F : \mathbb{R}^d \to \mathbb{R}^d$ is Lipschitz continuous.
Assume $V \in C^2(\mathbb{R}^d)$ and $(L_V - L_0 + 2L^{ref})V$ is bounded and Lipschitz continuous. Let
$p_t(x,y)$ be the transition probability density of the process

$$dX_t = -\nabla V(X_t)\,dt + F(X_t)\,dt + \sigma\,dW_t \tag{19}$$

and define

$$\varphi_{x,y}(r) = (1-r)x + ry$$

Then as $t \to 0^+$,

$$p_t(x,y) = \left(\exp\left[\sigma^{-2}\left(V(x) - V(y) + \frac{t}{2}\int_0^1 \left(L_V - L_0 + 2L^{ref}\right)V(\varphi_{x,y}(r))\,dr\right)\right] + O(t^{\alpha+1})\right)p_t^{ref}(x,y)$$

for any $\alpha \in (0, \tfrac12)$.



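The proof of Theorem 3.1 is a consequence of Theorem 2.1 and the following lemmas:

Lemma 3.4. Let $\tilde P$ be the measure under which $X_t$ satisfies

$$dX_t = \sigma\,d\tilde W_t, \quad X_0 = x$$

where $\tilde W_t$ is a $d$-dimensional Wiener process. Fix $t > 0$, $x = (x_1, x_2, \ldots, x_d) \in \mathbb{R}^d$,
$y = (y_1, y_2, \ldots, y_d) \in \mathbb{R}^d$ and $\delta > 0$. Define

$$N_\delta^{x,y}(r) = \prod_{k=1}^d \left(\left(1 - \tfrac{r}{t}\right)x_k + \tfrac{r}{t}\,y_k - \delta,\ \left(1 - \tfrac{r}{t}\right)x_k + \tfrac{r}{t}\,y_k + \delta\right)$$

Then

$$\tilde P\left(X_r \in N_\delta^{x,y}(r) \text{ for all } r \in [0,t] \,\middle|\, X_t = y\right) \ge 1 - 2d\exp\left(-\frac{2\delta^2}{\sigma^2 t}\right)$$

Proof. With $X_r^k$ the $k$th component of $X_r$, a well-known formula of Siegmund ([13])
leads to

$$\tilde P\left(X_r^k < \left(1 - \tfrac{r}{t}\right)x_k + \tfrac{r}{t}\,y_k + \delta,\ 0 \le r \le t \,\middle|\, X_t^k = y_k\right) = 1 - \exp\left(-\frac{2\delta^2}{\sigma^2 t}\right)$$

$$= \tilde P\left(X_r^k > \left(1 - \tfrac{r}{t}\right)x_k + \tfrac{r}{t}\,y_k - \delta,\ 0 \le r \le t \,\middle|\, X_t^k = y_k\right)$$

The result follows by subadditivity.
□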



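Lemma 3.5. Assume $G : \mathbb{R}^d \to \mathbb{R}$ is Lipschitz continuous with Lipschitz constant $K_G$.
Fix $t > 0$, $x, y \in \mathbb{R}^d$ and $\delta > 0$. If $X_r \in N_\delta^{x,y}(r)$ for all $r \in [0,t]$ then

$$\left|\int_0^t G(X_r)\,dr - t\int_0^1 G((1-r)x + ry)\,dr\right| \le \sqrt{d}\,\delta K_G\,t$$

Proof. Note that if $X_r \in N_\delta^{x,y}(r)$ for all $r \in [0,t]$, then

$$\left|\int_0^t G(X_r)\,dr - t\int_0^1 G((1-r)x + ry)\,dr\right| = \left|\int_0^t G(X_r)\,dr - \int_0^t G\left(\left(1 - \tfrac{r}{t}\right)x + \tfrac{r}{t}\,y\right)dr\right|$$

$$\le \int_0^t \left|G(X_r) - G\left(\left(1 - \tfrac{r}{t}\right)x + \tfrac{r}{t}\,y\right)\right| dr$$

$$\le \int_0^t \sqrt{d}\,\delta K_G\,dr = \sqrt{d}\,\delta K_G\,t$$

□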



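Proof of Theorem 3.1. Let $\tilde P$ be the reference measure under which $X_t$ satisfies

$$dX_t = \sigma\,d\tilde W_t, \quad X_0 = x, \quad 0 \le t \le T$$

where $\tilde W_t$ is a $d$-dimensional Wiener process, and let $\tilde E$ be the corresponding expectation.
For $y = (y_1, y_2, \ldots, y_d) \in \mathbb{R}^d$ and $h > 0$ define

$$S_{y,h} = \prod_{k=1}^d [y_k, y_k + h)$$

Using Theorem 2.1 with $\tilde V = 0$ yields

$$P(X_t \in S_{y,h}) = \tilde E\left[\exp\left(\sigma^{-2}\left(V(x) - V(X_t) + \frac{1}{2}\int_0^t (L_V + L_0)V(X_r)\,dr\right)\right)1_{\{X_t \in S_{y,h}\}}\right]$$

such that under $P$, $X_t$ satisfies

$$dX_t = -\nabla V(X_t)\,dt + \sigma\,dW_t, \quad X_0 = x, \quad 0 \le t \le T$$

with $W_t$ a $d$-dimensional $P$-Wiener process. Now

$$\frac{P(X_t \in S_{y,h})}{h^d} \tag{20}$$

$$= \frac{\tilde E\left[\exp\left(\sigma^{-2}\left(V(x) - V(X_t) + \frac{1}{2}\int_0^t (L_V + L_0)V(X_r)\,dr\right)\right)1_{\{X_t \in S_{y,h}\}}\right]}{\tilde P(X_t \in S_{y,h})}\cdot\frac{\tilde P(X_t \in S_{y,h})}{h^d} \tag{21}$$

Taking limits in (20)-(21) as $h \to 0$ gives

$$p_t(x,y) = \tilde E\left[\exp\left(\sigma^{-2}\left(V(x) - V(y) + \frac{1}{2}\int_0^t (L_V + L_0)V(X_r)\,dr\right)\right)\,\middle|\, X_t = y\right]p_t(y-x)$$

The first statement of the theorem follows from Lemmas 3.4-3.5 with $G = (L_V + L_0)V$.
The last statement follows by taking $\delta = t^\alpha$. □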



The proof of Theorem 3.3, which is omitted, is similar to the proof of Theorem 3.1 and
relies on the fact that the exit probabilities of the pinned diffusion of Lemma 3.4 retain
the same asymptotics as $t \to 0$ with the addition of a drift $F$ (see Theorem 2.1 of [15]).

4. Importance sampling and exiting a well 

Here the problem of estimating a small escape probability $P(A)$ of the process (1) is
considered. In standard Monte Carlo, one estimates $P(A)$ by taking the average number
of samples, out of some total $N$, for which the event $A$ is observed. More precisely, the
standard Monte Carlo approximation of $P(A)$ is

$$\Theta = \frac{1}{N}\sum_{n=1}^N 1_A^n \tag{22}$$

where $1_A^n$ are i.i.d. random variables with the same distribution (under $P$) as the indicator
function $1_A$. This estimate has expected value

$$E(\Theta) = P(A)$$

and variance

$$\mathrm{Var}(\Theta) = \frac{\mathrm{Var}(1_A)}{N}$$

where

$$\mathrm{Var}(1_A) = P(A) - P(A)^2 \tag{23}$$

The relative error of $\Theta$ is its standard deviation divided by its expected value:

$$\text{Relative Error}(\Theta) = \frac{\sqrt{\mathrm{Var}(\Theta)}}{P(A)} = \frac{1}{\sqrt{N}}\sqrt{P(A)^{-1} - 1}$$

The relative error blows up for fixed $N$ as $P(A) \to 0$, making the estimate (22) useless
for a fixed computational effort if $P(A)$ is very small.
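
To make the scaling concrete, the following short sketch evaluates the relative error $N^{-1/2}\sqrt{P(A)^{-1} - 1}$ of the standard Monte Carlo estimate and the number of samples needed to reach a target relative error. The function names are hypothetical and the numbers are illustrative only.

```python
import math

def mc_relative_error(p, n_samples):
    """Relative error of the standard Monte Carlo estimate of a probability p."""
    return math.sqrt((1.0 / p - 1.0) / n_samples)

def samples_for_relative_error(p, target):
    """Smallest N for which the relative error is at most `target`."""
    return math.ceil((1.0 / p - 1.0) / target**2)

for p in (1e-2, 1e-4, 1e-6):
    print(p, mc_relative_error(p, 10**6), samples_for_relative_error(p, 0.1))
```

The required sample size grows like $P(A)^{-1}$, so each factor of 100 in rarity costs a factor of 100 in computational effort at fixed accuracy.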

An alternative to standard Monte Carlo sampling is importance sampling (see e.g.
[16], [17]), in which one chooses another probability measure $\tilde P$ for sampling. One then
estimates $P(A)$ by taking the average number of samples (out of $N$) for which $A$ has
occurred, such that each sample is weighted by the factor $dP/d\tilde P$. More precisely, an
unbiased importance sampling estimator for $P(A)$ is

$$\tilde\Theta = \frac{1}{N}\sum_{n=1}^N \left(\frac{dP}{d\tilde P}1_A\right)^n \tag{24}$$

where $\left(\frac{dP}{d\tilde P}1_A\right)^n$ are i.i.d. random variables with the same distribution (under $\tilde P$) as
$\frac{dP}{d\tilde P}1_A$. Here $P$ must be absolutely continuous with respect to $\tilde P$. $\tilde\Theta$ is called unbiased because

$$P(A) = E[1_A] = \tilde E\left[\frac{dP}{d\tilde P}1_A\right]$$

which implies

$$P(A) = E[\Theta] = \tilde E[\tilde\Theta]$$

where $\tilde E$ is the expectation corresponding to $\tilde P$. To optimally reduce the number of
samples necessary to achieve a given error, one wants to minimize the variance

$$\mathrm{Var}(\tilde\Theta) = \frac{1}{N}\widetilde{\mathrm{Var}}\left(\frac{dP}{d\tilde P}1_A\right)$$

subject to constraints of feasibility. Here

$$\widetilde{\mathrm{Var}}\left(\frac{dP}{d\tilde P}1_A\right) = \tilde E\left[\left(\frac{dP}{d\tilde P}1_A\right)^2\right] - P(A)^2 = E\left[\frac{dP}{d\tilde P}1_A\right] - P(A)^2 \tag{25}$$



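One would hope, for instance, that the variance is greatly reduced compared with
standard Monte Carlo, that is,

$$\frac{\mathrm{Var}(\tilde\Theta)}{\mathrm{Var}(\Theta)} = \frac{\widetilde{\mathrm{Var}}\left(\frac{dP}{d\tilde P}1_A\right)}{\mathrm{Var}(1_A)} \le C \ll 1 \tag{26}$$

Another important quantity is the relative error

$$\text{Relative Error}(\tilde\Theta) = \frac{\sqrt{\mathrm{Var}(\tilde\Theta)}}{P(A)} = \frac{1}{\sqrt{N}}\sqrt{\frac{E\left[\frac{dP}{d\tilde P}1_A\right]}{P(A)^2} - 1} \tag{27}$$

To minimize the relative error, one wants to minimize the quantity

$$\Lambda = \frac{E\left[\frac{dP}{d\tilde P}1_A\right]}{P(A)^2} \tag{28}$$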



In general it is very difficult to prove an inequality like (26), or useful bounds on (27)-(28),
outside of certain asymptotic regimes. Examined below is the small noise regime
of the overdamped Langevin equation, defined by

$$dX_t^\epsilon = -\nabla V(X_t^\epsilon)\,dt + \sqrt{\epsilon}\,dW_t$$

where $\epsilon$ is a small parameter. The small noise regime can be thought of as a nearly
deterministic version of the SDE, where the dynamics are dominated by the potential
energy and diffusive effects are small.

Below, the reduction in variance from using (24) instead of (22) for estimating
probabilities in the small noise regime of the overdamped Langevin SDE is considered.
Though the analysis is restricted to the small noise overdamped Langevin equation, the
method itself is applicable to the second-order Langevin equation. The scheme involves
only changes in measure $\tilde P \to P$ corresponding to a fixed change in the potential $\tilde V \to V$.
That is, the sampling measure $\tilde P$ will correspond to the process

$$dX_t^\epsilon = -\nabla\tilde V(X_t^\epsilon)\,dt + \sqrt{\epsilon}\,d\tilde W_t, \quad X_0 = x_0, \quad 0 \le t \le T \tag{29}$$

whereas the target measure $P$ will correspond to

$$dX_t^\epsilon = -\nabla V(X_t^\epsilon)\,dt + \sqrt{\epsilon}\,dW_t, \quad X_0 = x_0, \quad 0 \le t \le T \tag{30}$$

Here and throughout the dependence of $P$ and $\tilde P$ on $\epsilon$ is suppressed. In Theorem 4.2 it
is shown that for estimating the probability of escaping a potential well, an exponential
reduction in variance (compared to standard Monte Carlo) can be achieved simply by
taking a sampling potential $\tilde V$ which reduces the depth of the well. The magnitude of
the reduction in variance is closely related to the difference $\tilde V(x_0) - V(x_0)$. The scheme
allows for the well to be "inverted," and in fact in Theorem 4.3 it is shown that under
certain conditions this creates an asymptotically optimal reduction in variance, in the
sense that [7]

$$\lim_{\epsilon \to 0}\epsilon\log\Lambda = 0 \tag{31}$$

The limit in (31) is optimal because for any $P$, $\tilde P$, and $A$, Jensen's inequality and (25)
imply $\Lambda \ge 1$.

Below $\Lambda_\epsilon$ is written for $\Lambda$ defined in (28), and $\Theta_\epsilon$, $\tilde\Theta_\epsilon$ are written for $\Theta$, $\tilde\Theta$ defined in
(22), (24), to emphasize the dependence of these objects on $\epsilon$. Events of the following
type will be considered:

Definition 4.1. Let $D \subset \mathbb{R}^d$ be a bounded open set such that $\partial D$ is a simple closed
curve, and define

$$A_\epsilon = \{X^\epsilon : X_T^\epsilon \notin D\}$$

The following is the main result of this section.
The following is the main result of this section. 

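Theorem 4.2. Assume $V \in C_b^2(\mathbb{R}^d)$, $\tilde V \in C_b^2(\mathbb{R}^d)$, such that:

(i) $V(x_0) < \tilde V(x_0)$

(ii) $|\nabla\tilde V(x)| \le |\nabla V(x)|$ for all $x \in D$

(iii) $\tilde V(x) = V(x)$ for all $x \notin D$

and define

$$M = \frac{1}{2}\sup_{x \in D}\left[\Delta V(x) - \Delta\tilde V(x)\right]$$

Let $\tilde P$ be the reference measure under which $X_t^\epsilon$ satisfies (29), and define $P$ as in (5).
(The dependence of $P$ and $\tilde P$ on $\epsilon$ is suppressed.) Then under $P$, $X_t^\epsilon$ satisfies (30). If

$$\tilde V(x_0) - V(x_0) > \epsilon TM$$

then

$$\frac{\mathrm{Var}(\tilde\Theta_\epsilon)}{\mathrm{Var}(\Theta_\epsilon)} \le e^{-\epsilon^{-1}(\tilde V(x_0) - V(x_0)) + TM} \tag{32}$$

Furthermore

$$\limsup_{\epsilon \to 0}\epsilon\log\Lambda_\epsilon \le V(x_0) - \tilde V(x_0) + I_V(x_0) \tag{33}$$

where

$$I_V(x_0) = \inf\left\{\frac{1}{2}\int_0^T \left|\dot\phi(t) + \nabla V(\phi(t))\right|^2 dt \,:\, \phi \in U_{x_0,T},\ \phi(T) \notin D\right\}$$

with

$$U_{x_0,T} = \left\{\phi : [0,T] \to \mathbb{R}^d \,:\, \phi(t) = x_0 + \int_0^t \dot\phi(s)\,ds,\ \int_0^T |\dot\phi(t)|^2\,dt < \infty\right\}$$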



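Proof. By Theorem 2.1 and assumptions (ii)-(iii),

$$E\left[\frac{dP}{d\tilde P}1_{A_\epsilon}\right] \le e^{-\epsilon^{-1}(\tilde V(x_0) - V(x_0)) + TM}\,P(A_\epsilon) \tag{34}$$

Using assumption (i), choose $\epsilon > 0$ so that

$$\tilde V(x_0) - V(x_0) > \epsilon TM$$

Then from (23), (25) and (34),

$$\frac{\mathrm{Var}(\tilde\Theta_\epsilon)}{\mathrm{Var}(\Theta_\epsilon)} = \frac{E\left[\frac{dP}{d\tilde P}1_{A_\epsilon}\right] - P(A_\epsilon)^2}{P(A_\epsilon) - P(A_\epsilon)^2} \le e^{-\epsilon^{-1}(\tilde V(x_0) - V(x_0)) + TM}$$

Comparing (34) with (28),

$$\Lambda_\epsilon = \frac{E\left[\frac{dP}{d\tilde P}1_{A_\epsilon}\right]}{P(A_\epsilon)^2} \le \frac{e^{-\epsilon^{-1}(\tilde V(x_0) - V(x_0)) + TM}}{P(A_\epsilon)} \tag{35}$$

From Definition 4.1 and continuity of $\nabla V$ it follows that $A_\epsilon$ is a continuity set [5] with
respect to the rate function

$$\phi \mapsto \begin{cases}\dfrac{1}{2}\displaystyle\int_0^T \left|\dot\phi(t) + \nabla V(\phi(t))\right|^2 dt, & \phi \in U_{x_0,T} \\[6pt] \infty, & \text{otherwise}\end{cases}$$

Therefore

$$\lim_{\epsilon \to 0}\epsilon\log P(A_\epsilon) = -I_V(x_0)$$

A simple calculation now leads to (33).
□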



Theorem 4.2 shows that by choosing a sampling potential $\tilde V$ which reduces the depth
of the potential well around $x_0$ and which agrees with $V$ outside the well, the probability
that the process (30) is outside the well at time $T$ can be estimated with an exponentially
reduced variance compared to standard Monte Carlo. The variance is reduced by a factor
proportional to

$$\exp\left[\epsilon^{-1}\left(V(x_0) - \tilde V(x_0)\right)\right]$$

See Figure 1.
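
The size of this reduction is easy to tabulate. The sketch below evaluates the bound $\exp\left(-\epsilon^{-1}(\tilde V(x_0) - V(x_0)) + TM\right)$ on the variance ratio, where $M$ is half the supremum over $D$ of $\Delta V - \Delta\tilde V$. The function name is hypothetical, and the numerical inputs (a well lifted by 4 at $x_0$, $M = 1$, $T = 1$) are illustrative assumptions rather than values asserted by the theorem.

```python
import math

def variance_ratio_bound(eps, T, well_lift, M):
    """Upper bound on Var(importance sampling) / Var(standard Monte Carlo).

    well_lift : Vt(x0) - V(x0), the amount by which the sampling potential
                raises the bottom of the well (illustrative parameter name)
    The bound is only claimed when well_lift > eps * T * M.
    """
    if well_lift <= eps * T * M:
        raise ValueError("bound requires well_lift > eps * T * M")
    return math.exp(-well_lift / eps + T * M)

for eps in (1.0, 0.5, 0.25):
    print(eps, variance_ratio_bound(eps, T=1.0, well_lift=4.0, M=1.0))
```

Halving $\epsilon$ squares the leading exponential factor, which is the sense in which the reduction is exponential in $\epsilon^{-1}$.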

In Theorem 4.3 below it is shown that if the well has a flat boundary, then an
asymptotically optimal scheme is obtained by inverting the potential well inside $D$; see Figure 2.

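Theorem 4.3. Assume $V \in C_b^2(\mathbb{R}^d)$ and that $\nabla V(x) = 0$ on $\partial D$. Then WLOG we may
take $V(x) = 0$ on $\partial D$. Define

$$\tilde V(x) = \begin{cases} -V(x), & \text{if } x \in D \\ V(x), & \text{if } x \notin D \end{cases}$$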


Figure 1. Using the sampling potential $\tilde V$ to estimate the rare observable
$P(X_T^\epsilon \notin D)$, for $D = (a,b)$. The logarithm of the reduction of variance
compared to standard Monte Carlo is closely related to the length of the
vertical line.



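and assume $\tilde V \in C^2(\mathbb{R}^d)$. Define

$$K = \sup_{x \in D}|\Delta\tilde V| = \sup_{x \in D}|\Delta V|$$

Let $\tilde P$ be the reference measure under which $X_t^\epsilon$ satisfies (29), and define $P$ as in (5),
so that under $P$, $X_t^\epsilon$ satisfies (30). (The dependence of $P$ and $\tilde P$ on $\epsilon$ is suppressed.)
Furthermore assume the solution $y = y(t)$ to the IVP

$$\frac{dy}{dt} = -\nabla\tilde V(y), \quad y(0) = x_0 \tag{36}$$

satisfies $y(T) \notin D$. Then

$$\lim_{\epsilon \to 0}\epsilon\log\Lambda_\epsilon = 0$$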



Proof. From the proof of Theorem 4.2,

$$\limsup_{\epsilon \to 0}\epsilon\log\Lambda_\epsilon \le 2V(x_0) + I_V(x_0) \tag{37}$$

$$= 2V(x_0) - \lim_{\epsilon \to 0}\epsilon\log P(A_\epsilon) \tag{38}$$

Now Theorem 2.1 gives

$$P(A_\epsilon) \ge e^{2\epsilon^{-1}V(x_0) - TK}\,\tilde P(A_\epsilon) \tag{39}$$



Figure 2. Using the sampling potential $\tilde V$ to estimate the rare observable
$P(X_T^\epsilon \notin D)$, for $D = (a,b)$. If $X_T^0 \notin D$ (that is, if the process lands
outside of $D$ with deterministic dynamics) then the reduction in variance
is asymptotically optimal.



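From Definition 4.1 and continuity of $\nabla\tilde V$ it follows that $A_\epsilon$ is a continuity set with
respect to the rate function

$$\phi \mapsto \begin{cases}\dfrac{1}{2}\displaystyle\int_0^T \left|\dot\phi(t) + \nabla\tilde V(\phi(t))\right|^2 dt, & \phi \in U_{x_0,T} \\[6pt] \infty, & \text{otherwise}\end{cases}$$

Therefore by assumption (36)

$$\lim_{\epsilon \to 0}\epsilon\log\tilde P(A_\epsilon) = -I_{\tilde V}(x_0) = 0$$

From (39) it follows that

$$\liminf_{\epsilon \to 0}\epsilon\log P(A_\epsilon) \ge 2V(x_0) \tag{40}$$

By comparing (40) with (37)-(38)

$$\limsup_{\epsilon \to 0}\epsilon\log\Lambda_\epsilon \le 0$$

From Jensen's inequality $\Lambda_\epsilon \ge 1$ and the result follows. □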



Although the assumptions of Theorem 4.3 are very restrictive, the result suggests what
changes in potential should be most effective more generally. In particular, by inspecting
(5) and the proof of Theorem 4.2 one sees that in choosing an optimal $\tilde V$, there is a
competition between maximizing $\tilde V(x_0)$ while also minimizing $|\nabla\tilde V(x)|$ and maximizing
$\Delta\tilde V(x)$ for $x \in D$, in the sense that when any two of these three are fixed, optimizing
the third reduces the variance. Here it is assumed that $\tilde V$ agrees with $V$ outside $D$.
Theorem 4.2 shows that in the small noise limit, maximizing $\Delta\tilde V(x)$ becomes irrelevant;
Theorem 4.3 suggests that it may be near optimal to choose a $\tilde V$ which maximizes $\tilde V(x_0)$
while also minimizing $\big||\nabla\tilde V(x)| - |\nabla V(x)|\big|$ for $x \in D$.



5. Example 

Consider the one-dimensional overdamped Langevin SDE

$$dX_t = -\frac{d}{dx}V(X_t)\,dt + dW_t, \quad X_0 = x_0 = 0, \quad 0 \le t \le T \tag{41}$$

where $V(x) = -\cos x - 1$ and $W_t$ is a Wiener process. Let $T = 1$, define

$$A = \{X_\cdot : X_T \notin (-\pi,\pi)\} \tag{42}$$

and suppose the probability of interest is $P(A)$ where $P$ is the probability measure of the process
(41). Consider the importance sampling scheme of Section 4 with sampling potentials



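$$\tilde V^A(x) = \begin{cases} 0, & \text{if } x \in (-\pi,\pi) \\ V(x), & \text{otherwise} \end{cases}$$

$$\tilde V^B(x) = \begin{cases} -V(x), & \text{if } x \in (-\pi,\pi) \\ V(x), & \text{otherwise} \end{cases}$$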



Let $\tilde P^A$ be the reference measure under which $X_t$ satisfies

$$dX_t = -\frac{d}{dx}\tilde V^A(X_t)\,dt + d\tilde W_t^A, \quad X_0 = x_0, \quad 0 \le t \le T \tag{43}$$

and let $\tilde P^B$ be the reference measure under which $X_t$ satisfies

$$dX_t = -\frac{d}{dx}\tilde V^B(X_t)\,dt + d\tilde W_t^B, \quad X_0 = x_0, \quad 0 \le t \le T \tag{44}$$

where $\tilde W_t^A$ and $\tilde W_t^B$ are $\tilde P^A$- and $\tilde P^B$-Wiener processes, respectively. Then under $P$
defined by (5), $X_t$ satisfies (41). The following table compares standard Monte Carlo
estimates of $P(A)$, using $\Theta$ defined in (22), to estimates from the scheme outlined in
Section 4. The importance sampling estimators $\tilde\Theta^A$ and $\tilde\Theta^B$ corresponding to $\tilde P^A$ and
$\tilde P^B$ are defined as in (24). Samples are obtained using Euler approximations of (41) and
(43)-(44) with step size $h = 10^{-5}$ and Riemann approximations of (5) with mesh size $r$.


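estimator          potential      N (# of samples)   r          sample average   sample variance
$\Theta$           $V$            $10^{?}$           N/A        0.000192         0.000192
$\tilde\Theta^A$   $\tilde V^A$   $10^7$             $10^{-1}$  0.000195         0.000022
$\tilde\Theta^A$   $\tilde V^A$   $10^7$             $10^{-2}$  0.000193         0.000022
$\tilde\Theta^A$   $\tilde V^A$   $10^7$             $10^{-3}$  0.000193         0.000022
$\tilde\Theta^B$   $\tilde V^B$   $10^7$             $10^{-1}$  0.000204         0.0000039
$\tilde\Theta^B$   $\tilde V^B$   $10^7$             $10^{-2}$  0.000196         0.0000039
$\tilde\Theta^B$   $\tilde V^B$   $10^7$             $10^{-3}$  0.000197         0.0000039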

Note that sampling with either $\tilde V^A$ or $\tilde V^B$ reduces the variance significantly. The greater
reduction in variance is obtained by sampling with $\tilde V^B$. This is consistent with
Theorem 4.3, which suggests $\tilde V^B$ is asymptotically optimal. Note that $\tilde V^A$ and $\tilde V^B$ do not
quite satisfy the conditions of the theorem since the second derivatives of $\tilde V^A$ and $\tilde V^B$
at $x = \pm\pi$ do not exist, yet the scheme is nonetheless accurate and effective. One suspects the
assumption $\tilde V \in C_b^2(\mathbb{R}^d)$ in Theorem 4.2 and Theorem 4.3 can be relaxed to a weaker
smoothness condition on $\partial D$; this generalization is not pursued here. Though $\epsilon = 1$ here is
"far" from zero, one expects that $P(A)$ being small means exactly that one is effectively in
the small noise regime.
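
The computation behind the inverted-well rows of the table can be sketched as follows. For the potential $V(x) = -\cos x - 1$ with $\sigma = 1$, inverting the well on $(-\pi,\pi)$ gives a sampling drift of $+\sin x$ inside the well and $-\sin x$ outside, and on the escape event the weight from Theorem 2.1 reduces to $\exp\left(-4 + \int_0^T \cos(X_s)\,1_{\{X_s \in (-\pi,\pi)\}}\,ds\right)$. The function name, sample size, and step size below are assumptions chosen so that the sketch runs quickly; in particular the Riemann mesh is taken equal to the Euler step, which is a simplification and not the setup used for the table.

```python
import numpy as np

def estimate_escape_inverted_well(n_paths, h, T=1.0, seed=0):
    """Importance sampling estimate of P(|X_T| >= pi) using the inverted well.

    Paths start at 0 and are simulated by Euler steps under the sampling
    potential Vt = -V on (-pi, pi), Vt = V elsewhere, with V(x) = -cos(x) - 1.
    Returns the sample mean and sample variance of the weighted indicator.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / h))
    x = np.zeros(n_paths)
    cos_integral = np.zeros(n_paths)   # Riemann sum of cos(X_s) 1{X_s in D} ds
    for _ in range(n_steps):
        inside = np.abs(x) < np.pi
        cos_integral += np.where(inside, np.cos(x), 0.0) * h
        # -d/dx Vt(x): +sin(x) inside the inverted well, -sin(x) outside
        drift = np.where(inside, np.sin(x), -np.sin(x))
        x = x + drift * h + rng.normal(0.0, np.sqrt(h), size=n_paths)
    escaped = np.abs(x) >= np.pi
    # On the escape event V(X_T) = Vt(X_T), so only V(0) - Vt(0) = -4 remains
    weights = np.where(escaped, np.exp(-4.0 + cos_integral), 0.0)
    return weights.mean(), weights.var(ddof=1)

mean, var = estimate_escape_inverted_well(n_paths=100_000, h=0.002)
print(mean, var)
```

With these coarser settings the estimate is expected to land near the values reported in the table, though not to reproduce them digit for digit.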



6. Conclusion 

The problem of estimating small probabilities of the overdamped Langevin process, a 
well-known and important model of physical systems, is explored. Since standard Monte 
Carlo techniques are often impractical in this setting, it is useful to have alternative means 
of estimating averages of observables, in particular of transition probabilities. This paper 
explores small transition probabilities of Langevin processes in two asymptotic regimes: 
$t \to 0$ and $\beta = 2\epsilon^{-1} \to \infty$. A first-order accurate asymptotic correction to transition
probability densities as $t \to 0$ is proved in Theorem 3.1 and an importance sampling
technique for estimating escape probabilities as $\beta \to \infty$ is shown in Theorem 4.2 to
perform exponentially better than standard Monte Carlo. It is shown that this technique
is asymptotically optimal in some cases (Theorem 4.3). The importance sampling scheme
has the virtue of requiring nearly negligible added computation (compared with standard
Monte Carlo) during simulations, and it is shown to be effective in a simple numerical
example.